It took me a few days, but I was able to reverse engineer the CT Decompression routine.  The mammoth block of assembly code it consists of contains 4 copies of the routine.  They vary only slightly by which banks they copy the data to and whether the bytes to make multiple copies of are far or near.  

Overall, its not nearly as interesting or efficient of a compression scheme as some others are.  Then again, it does have to decompress it in real time using a 3.58 MHz processor.

I think its a good idea to keep ROM information as open as possible, so I'm distributing my source for this routine.  This will probably only be of interest to developers or people wanting to learn the SNES's assembly.  It has most of the ASM interspersed in the code as comments.  It is written in C# which is sort of a cross between C++ and Visual Basic.

Evil Peer, evilpeer@hotmail.com



Recompression Algorithm Explanation

It took even more time to reverse engineer the recompression algorithm.  To decompress, I didn't have to understand the format, I just had to understand the ASM code.  Recompression required full understanding.  This code was reverse engineered independant of other resources.

Here is how the compression is laid out.

The first two bytes are an unsigned short that determines the length of the main body of compressed data.  The main body consists of a number of compressed packets.  

Each packet starts with a single byte header that determines which bits are compressed and which isn't.  Each bit, starting from the least significant (or 0x01) is a flag indicating whether compression is used or not.  Each compressed data stream uses two bytes and each uncompressed byte is obviously just one.  For example, a packet with a header of 23 would look like 23CC CCCC CCUU UUUU CCCC UUUU.  In a roundabout way, the header also determines the length of the packet, as the length is nine plus the number of bits set.

Each two byte compressed stream is an unsigned short.  The most significant bits indicate the number of bytes to copy and the least significant bits indicate the offset of the data to copy (counting backwards).  How many bits each uses depends on the maximum copy bytes and the data range.  It will either be 0xF800 for the copy bytes and 0x07FF for the range or 0xF000 and 0x0FFF.  To get the number of bytes to copy, add three to the down-shifted number (so a maximum of 34 and 18 respectively).

After the main body is a byte that determines if there are any addendums and what address range the compression packets use.  If bits 0x80 or 0x40 are set, the compressed data streams use a 0x07FF range.  The remaining bits (0x3F) are used to determine the length of the addendum (if any).

After this postcedant byte is another unsigned short that determines the length of everything up to that point plus the length of this addendum.  Sort of a "final" compressed length.

Typically, the addendum is anything that doesn't fit neatly into an eight bit packet header (for example, if the last packet only uses 3 bits).  There is no technical reason why there can't be more than one addendum, but I can't fathom why more than one would be used.  The packet header for the addendum must have any unused bits set (though I can't determine why).  So if the addendum only has one uncompressed byte, the packet header for the addendum should be 0xFE.

The final byte of the compressed data should have bits 0x3F cleared (set to 0).


Versions
20031216
- Discovered that the game is picky about the addendum's packet header

20031215
- Minor bug fix to recompression code that omitted final byte
- Minor correction to format explanation concerning final byte

20031204-B
- Minor bug fixes for problems resulting from combining short and long range code

20031204-A
- Recompression code
- Compression format explanation

20030625
- Decompression routine with ASM breakdown